Results 1 - 2 of 2
1.
9th International Conference on Big Data Analytics, BDA 2021 ; 13167 LNCS:201-208, 2022.
Article in English | Scopus | ID: covidwho-1750588

ABSTRACT

With the ever-increasing internet penetration across the world, there has been a huge surge in content on the World Wide Web, and video has proven to be one of the most popular media. The COVID-19 pandemic has further pushed the envelope, forcing learners to turn to E-Learning platforms. In the absence of relevant descriptions of these videos, it becomes imperative to generate metadata based on the content of the video. In this paper, an attempt has been made to index videos based on their visual and audio content. The visual content is extracted by applying Optical Character Recognition (OCR) to the stack of frames obtained from a video, while the audio content is transcribed using Automatic Speech Recognition (ASR). The OCR- and ASR-generated texts are combined to obtain the final description of the respective video. The dataset contains 400 videos spread across 4 genres. To quantify the accuracy of the descriptions, clustering is performed on the video descriptions to discern the genre of each video. © 2022, Springer Nature Switzerland AG.
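The indexing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame-level OCR strings and the ASR transcript are assumed inputs (in practice they would come from an OCR engine and a speech recognizer), and the only logic shown is collapsing repeated OCR text from consecutive frames and concatenating the two modalities into one searchable description.

```python
def dedupe_consecutive(texts):
    """Collapse repeats from consecutive frames showing the same on-screen text."""
    out = []
    for t in texts:
        t = t.strip()
        if t and (not out or t != out[-1]):
            out.append(t)
    return out

def build_description(frame_ocr_texts, asr_transcript):
    """Combine frame-level OCR text and the ASR transcript into one description."""
    visual = " ".join(dedupe_consecutive(frame_ocr_texts))
    return f"{visual} {asr_transcript}".strip()

# Hypothetical example inputs: OCR output for three sampled frames
# (the first slide persists across two frames) plus an ASR transcript.
desc = build_description(
    ["Lecture 1: Sorting", "Lecture 1: Sorting", "Merge sort demo"],
    "today we compare merge sort and quick sort",
)
```

The resulting description string is what would then be vectorized (e.g. as bag-of-words) and clustered to separate the four genres.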

2.
2021 IEEE International Conference on Big Data, Big Data 2021 ; : 899-908, 2021.
Article in English | Scopus | ID: covidwho-1730897

ABSTRACT

This paper studies an emerging and important problem: identifying misleading COVID-19 short videos in which the misleading content is jointly expressed in the visual, audio, and textual content of the videos. Existing solutions for misleading video detection mainly focus on the authenticity of video or audio against AI algorithms (e.g., deepfakes) or video manipulation, and are insufficient for our problem, where most videos are user-generated and intentionally edited. Two critical challenges exist in solving our problem: (i) how to effectively extract information from the distractive and manipulated visual content of TikTok videos, and (ii) how to efficiently aggregate heterogeneous information across the different modalities of short videos. To address these challenges, we develop TikTec, a multimodal misinformation detection framework that explicitly exploits captions to accurately capture the key information from the distractive video content, and effectively learns the composed misinformation that is jointly conveyed by the visual and audio content. We evaluate TikTec on a real-world COVID-19 video dataset collected from TikTok. Evaluation results show that TikTec achieves significant performance gains over state-of-the-art baselines in accurately detecting misleading COVID-19 short videos. © 2021 IEEE.
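The cross-modality aggregation described in the abstract can be illustrated with a toy late-fusion sketch. This is not TikTec itself: the per-modality feature vectors (caption, visual, audio) and the classifier weights below are hypothetical placeholders standing in for learned representations, and the point is only the structure, i.e. joining heterogeneous modality features into one representation that a single classifier scores.

```python
def fuse(caption_vec, visual_vec, audio_vec):
    """Concatenate per-modality feature vectors into one joint representation."""
    return caption_vec + visual_vec + audio_vec

def score(features, weights, bias=0.0):
    """Toy linear classifier over the fused features; > 0 flags 'misleading'."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# Hypothetical 2-dim caption, 2-dim visual, and 1-dim audio features.
joint = fuse([0.2, 0.9], [0.5, 0.1], [0.7])
s = score(joint, [1.0, 2.0, -0.5, 0.3, 1.1])
```

In the actual framework the fusion is learned (attention over caption-guided visual and audio representations) rather than a fixed concatenation, but the input/output contract is the same: one decision from several modality-specific feature streams.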
